3 research outputs found

    Fractional norms and quasinorms do not help to overcome the curse of dimensionality

    Full text link
    The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional quasinorms lp (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We confirm that fractional quasinorms have a greater relative contrast or coefficient of variation than the Euclidean norm l2, but we also demonstrate that distance concentration shows qualitatively the same behaviour for all tested norms and quasinorms, and that the difference between them decays as dimension tends to infinity. Estimation of classification quality for kNN based on different norms and quasinorms shows that a greater relative contrast does not imply better classifier performance, and the worst performance on different databases was shown by different norms (quasinorms). A systematic comparison shows that the difference in performance of kNN based on lp for p = 2, 1, and 0.5 is statistically insignificant.
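
    Below is a minimal Python sketch (not the paper's code) of the two quantities the study compares: the relative contrast of lp distances as dimension grows, and a toy kNN classifier built on lp dissimilarities for p = 0.5, 1, and 2. The data, sample sizes, and k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lp_dissimilarity(A, B, p):
    """Pairwise lp 'distance' (a quasinorm for p < 1) between rows of A and B."""
    diff = np.abs(A[:, None, :] - B[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)

def relative_contrast(X, p):
    """(D_max - D_min) / D_min of lp distances from the origin to the rows of X."""
    d = (np.abs(X) ** p).sum(axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

# Distance concentration: the relative contrast shrinks as dimension grows
# for l0.5 and l1 just as it does for l2, although l0.5 starts out larger.
for n_dim in (10, 100, 1000):
    X = rng.uniform(size=(1000, n_dim))
    print(n_dim, {p: round(relative_contrast(X, p), 3) for p in (0.5, 1.0, 2.0)})

# A toy kNN (k = 5, majority vote) built on the same lp dissimilarities.
X = rng.normal(size=(300, 50))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
for p in (0.5, 1.0, 2.0):
    D = lp_dissimilarity(X_test, X_train, p)     # shape (n_test, n_train)
    nearest = np.argsort(D, axis=1)[:, :5]       # indices of the 5 nearest training points
    pred = (y_train[nearest].mean(axis=1) > 0.5).astype(int)
    print(p, "accuracy:", (pred == y_test).mean())
```

    On synthetic data like this, the accuracies for the three values of p are typically close, which is the pattern the study reports across real databases.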

    Developing Machine Learning-Based Algorithms: Classification and Regression

    Full text link
    The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional lp quasinorms (for p less than 1) can help to overcome this curse in classification problems. In this thesis (chapter 3), this hypothesis is systematically tested. We demonstrate that fractional norms and quasinorms do not help to overcome the curse of dimensionality. A second strand of the thesis investigates a series of linear regression models based on different loss functions, in order to analyse the robustness of the coefficients across all models under consideration. This led us to propose a new, robust Piecewise Quadratic Sub-Quadratic (PQSQ) regression model (chapter 4). The proposed method combines the advantages of the PQSQ-L1 and PQSQ-L2 loss functions, yielding the proposed PQSQ-Huber method. The thesis also investigates linear regression models in the presence of multicollinearity and outliers in a dataset. The Ordinary Least Squares (OLS) estimator is unstable and displays a large variance of coefficients, or its solution may not even exist. Several regularisation methods, including ridge regression (RR), can reduce the variance of the OLS coefficients at the cost of introducing some bias. However, ridge regression is also based on the minimisation of a quadratic loss function, which is sensitive to outliers. We therefore propose novel robust ridge regression estimators based on the PQSQ function (chapter 5).
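
    As an illustration of the robustness idea behind the PQSQ-Huber proposal (quadratic loss for small residuals, slower-than-quadratic growth for large ones), here is a minimal sketch of a plain Huber-loss linear regression fitted by iteratively reweighted least squares, with an optional ridge term, compared to OLS on data with contaminated responses. This is not the thesis's PQSQ algorithm; the function name, delta, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def huber_irls(X, y, delta=1.0, ridge=0.0, n_iter=50):
    """Linear regression with a Huber loss (quadratic below delta, linear above),
    optionally with an l2 (ridge) penalty, fitted by iteratively reweighted
    least squares.  Illustrative only; not the thesis's PQSQ method."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for small residuals, delta/|r| for large ones.
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), delta))
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X + ridge * np.eye(d), X.T @ W @ y)
    return beta

# Data with a few gross outliers in the response.
n, d = 200, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.3 * rng.normal(size=n)
y[:10] += 20.0                                   # contaminate 5% of responses

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hub = huber_irls(X, y, delta=1.0)
print("OLS coefficient error:  ", np.linalg.norm(beta_ols - beta_true))
print("Huber coefficient error:", np.linalg.norm(beta_hub - beta_true))
```

    Setting ridge to a positive value in this sketch combines the outlier-resistant loss with ridge-type shrinkage, which is the general direction of the robust ridge estimators proposed in chapter 5.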

    Kibria–Lukman-Type Estimator for Regularization and Variable Selection with Application to Cancer Data

    No full text
    Following the idea behind the elastic-net and Liu-LASSO estimators, we proposed a new penalized estimator based on the Kibria–Lukman estimator with an L1 norm to perform both regularization and variable selection. We defined a coordinate descent algorithm for the new estimator and compared its performance with those of some existing machine learning techniques, such as the least absolute shrinkage and selection operator (LASSO), the elastic-net, Liu-LASSO, the GO estimator and the ridge estimator, through simulation studies and real-life applications, in terms of test mean squared error (TMSE), coefficient mean squared error (βMSE), false-positive (FP) coefficients and false-negative (FN) coefficients. Our results revealed that the new penalized estimator performs well for both simulated low- and high-dimensional data. The two real-life applications also show that the new method predicts the target variable better than the existing ones under the test RMSE metric.
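
    As a sketch of the coordinate descent template that such penalized estimators follow, the snippet below implements coordinate descent for an elastic-net-type objective (squared loss plus L1 and L2 penalties) on toy data. It is not the proposed Kibria–Lukman-type estimator itself, whose penalty differs; the function name, penalty weights, and data are illustrative assumptions.

```python
import numpy as np

def elastic_net_cd(X, y, lam1=0.1, lam2=0.1, n_iter=200):
    """Coordinate descent for an elastic-net-type objective
        (1/2n) * ||y - X b||^2 + lam1 * ||b||_1 + (lam2/2) * ||b||^2.
    Sketch of the template the proposed KL-L1 estimator builds on;
    not the paper's algorithm."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r_j = y - X @ b + X[:, j] * b[j]          # partial residual excluding j
            rho = X[:, j] @ r_j / n
            # Soft-thresholding handles the L1 part; lam2 adds ridge-type shrinkage.
            b[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
    return b

# Toy sparse example: 3 true nonzero coefficients out of 20.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = (2.0, -1.5, 1.0)
y = X @ beta_true + 0.5 * rng.normal(size=100)

b = elastic_net_cd(X, y, lam1=0.05, lam2=0.1)
print("selected (nonzero) coefficients:", np.flatnonzero(b))
```

    The soft-thresholding step is what produces exact zeros, i.e. variable selection; the proposed estimator keeps this L1 mechanism while replacing the ridge-type shrinkage with a Kibria–Lukman-type term.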